Faster Inversion and Other Black Box Matrix Computations 

Using Efficient Block Projections 



Wayne Eberly 1 , Mark Giesbrecht 2 , Pascal Giorgi 24 , Arne Storjohann 2 , Gilles Villard 3 

(1) Department of Computer Science, U. Calgary 

http : //pages . cpsc .ucalgary . ca/~ eberly 

(2) David R. Cheriton School of Computer Science, U. Waterloo 

http : //www.uwaterloo . ca/~{mwg, pgiorgi , astorjoh} 

(3) CNRS, LIP, Ecole Normale Superieure de Lyon 
http : //per so . ens-lyon . fr /gilles . villard 

(4) IUT - Universite de Perpignan 

http : //webdali .univ-perp . f r/~pgiorgi 



ABSTRACT 

Efficient block projections of non-singular matrices have re- 
cently been used, in [11], to obtain an efficient algorithm to 
find rational solutions for sparse systems of linear equations. 
In particular a bound of 0~(n 2 ' 5 ) machine operations is ob- 
tained for this computation assuming that the input matrix 
can be multiplied by a vector with constant-sized entries 
using 0~(n) machine operations. Somewhat more general 
bounds for black-box matrix computations are also derived. 
Unfortunately, the correctness of this algorithm depends on 
the existence of efficient block projections of non-singular 
matrices and this has been conjectured but not proved. 

In this paper we establish the correctness of the algo- 
rithm from [11] by proving the existence of efficient block 
projections for arbitrary non-singular matrices over suffi- 
ciently large fields. We further demonstrate the usefulness of 
these projections by incorporating them into existing black- 
box matrix algorithms to derive improved bounds for the 
cost of several matrix problems — considering, in particu- 
lar, "sparse" matrices that can be be multiplied by a vector 
using 0~(n) field operations: We show how to compute the 
dense inverse of a sparse matrix over any field using an ex- 
pected number of 0~(n 2,27 ) operations in that field. A basis 
for the null space of a sparse matrix — and a certification of 
its rank — are obtained at the same cost. An application of 
this technique to Kaltofen and Villard's Baby-Steps/Giant- 
Steps algorithms for the determinant and Smith Form of an 
integer matrix is also sketched, yielding algorithms requiring 
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0~(n 2 ' 66 ) machine operations. More general bounds involv- 
ing the number of black-box matrix operations to be used 
are also obtained. 

The derived algorithms are all probabilistic of the Las 
Vegas type. That is, they are assumed to be able to generate 
random elements from the field at unit cost, and always 
output the correct answer in the expected time given. 

1. INTRODUCTION 

In our paper [11] we presented an algorithm which pur- 
portedly solved a sparse system of rational equations consid- 
erably more efficiently than standard linear equations solv- 
ing. Unfortunately, its effectiveness in all cases was conjec- 
tural, even as its complexity and actual performance were 
very appealing. This effectiveness relied on a conjecture re- 
garding the existence of so-called efficient block projections. 
Given a matrix A £ f nxn over arl y field F, these projections 
should be block vectors u £ f nxs (where s is a blocking fac- 
tor dividing n, so n = ms) such that we can compute uv or 
v u quickly for any v £ F™ x s , and such that the sequence of 
vectors it, Au, . . . , A m ~ 1 u has rank n. 

In this paper, we prove the existence of a class of such 
efficient block projections for non-singular n x n matrices 
over sufficiently large fields — we require that the size of 
the field F exceed n(n + 1). This implies our algorithm 
from [11] for finding the solution to A~ 1 b for a "sparse" sys- 
tem of equations A G Z nxn and b € Z' IX1 works as stated 
using these projections, and requires fewer bit operations 
than any previously known when A is sparse, at least using 
"standard" (i.e., cubic) matrix arithmetic. Here, by A be- 
ing sparse, we mean that it has a fast matrix- vector product 
modulo any small (machine-word size) prime p. In partic- 
ular, our algorithm requires an expected 0~(n ,5 (log(||A|| + 
||6||))) matrix- vector products v i— » Av mod p for v £ Z" xl 
plus and additional 0~(n 2,5 (log(||yl|| + log ||b||))) bit opera- 
tions. The algorithm is probabilistic of the Las Vegas type. 
That is, it assumes the ability to generate random bits at 
unit cost, and always returns the correct answer with con- 
trollably high probability. When <j)( n ) = 0~(n), the implied 
cost of 0~(n 2 5 ) bit operations improves upon the p-adic lift- 
ing method of [7] which requires <J~{n 3 ) bit operations for 
sparse or dense matrices. This theoretical efficiency was re- 



fleeted in practice in [11] at least for large matrices. 

We present several other rather surprising applications 
of this technique. Each incorporates the technique into an 
existing algorithm in order to reduce the asymptotic com- 
plexity for the sparse matrix problem to be solved. In par- 
ticular, given a matrix A G F nxn over an arbitrary field 
F, we are able to compute the complete inverse of A with 
0~(n 3 ~ 1 / ( ' t ' J ~ 1 ^) operations in F plus O^n 2-1 ^" -1 )) matrix- 
vector products by A. Here lj is such that we can multiply 
two n x n matrices with 0(n w ) operations in F. Standard 
matrix multiplication gives w = 3, while the best known 
matrix multiplication of Coppersmith & Winograd [6] has 
lj = 2.376. If again we can compute v i— » Av with 0~(n) 
operations in F, this implies an algorithm to compute the 



inverse with 0~(* 



,3-1/(^-1)1 



operations in F. This is always 



less than 0{n w ), and in particular equals 0~(n ' ) oper- 
ations in F for the best known to of [6]. Other relatively 
straightforward applications of these techniques yield algo- 
rithms for the full nullspace and (certified) rank with this 
same cost. Finally, we sketch how these methods can be 
employed in the algorithms of Kaltofen and Villard [20] and 
[14] to computing the determinant and Smith form of sparse 
matrices more efficiently. 

There has certainly been much important work done on 
finding exact solutions to sparse rational systems prior to 
[11]. Dixon's p-adic lifting algorithm [7] performs extremely 
well in practice for dense and sparse linear systems, and 
is implemented efficiently in LinBox [8], Maple and Magma 
(see [11] for a comparison). Kaltofen & Saunders [18] are the 
first to propose to use Krylov-type algorithms for these prob- 
lems. Krylov-type methods are used to find Smith forms of 
sparse matrices and to solve Diophantine systems in paral- 
lel in [13, 14], and this is further developed in [20, 9]. Sec 
the references in these papers for a more complete history. 
For sparse systems over a field, the seminal work is that 
of Wiedemann [23] who shows how to solve sparse n x n 
systems over a field with 0(n) matrix- vector products and 
0(n ) other operations. This research is further developed 
in [18, 5, 17] and many other works. The bit complexity of 
similar operations for various families of structure matrices 
is examined in [12]. 

2. SPARSE BLOCK GENERATORS FOR 
THE KRYLOV SPACE 

For now we will consider an arbitrary invertible matrix 
A G F nxn over a field F, and s an integer — the blocking 
factor — which divides n exactly. Let m — n/s. For a so- 
called block projection u G F nxs and 1 < k < m, we denote 
by fCk(A,u) the block Krylov matrix [it, Au, . . . , A k ~ L u] G 
pnxfes Qur gQal ig tQ ghow that )C m (A,u) G F nxn is non- 
singular for a particularly simple and sparse u, assuming 
some properties of A. 

Our factorization uses the special projection (which we 
will refer to as an efficient block projection) 



G F" 



(2.1) 



which is comprised of m copies of I s and thus has exactly 
n non-zero entries. A similar projection has been suggested 
in [11] without proof of its reliability (i.e., that the corre- 
sponding block Krylov matrix is non-singular). We establish 



here that it does yield a block Krylov matrix of full rank, 
and hence can be used for an efficient inverse of a sparse A. 

Let V = diag(d)i, ... ,61,62, ... ,62, ... ,6 m , ■■■ ,5 m ) be an 
n x n diagonal matrix whose entries consist of m distinct 
indeterminates Si, each Si occurring s times. 

Theorem 2.1. If the leading ks x ks minor of A is non- 
zero for 1 < k < m, then K, m (VAV, u) G F nxn is invertible. 

Proof. Let B = VAV. For 1 < k < m, define B k as 
the specialization of B obtained by setting 5 k+1 , S k+2 , . . . ,6 m 
to zero. Then Bk is the matrix constructed by setting to 
zero the last n — ks rows and columns of B. Similarly, for 
1 < k < m we define Uk G F nxs to be the matrix constructed 
from u by setting to zero the last n — ks rows. In particular 
we have B m = B and u m = u. 

We proceed by induction on k and show that 



rank K.k(B k ,u k ) 



ks, 



(2.2) 



for 1 < k < m. For the base case k = 1 we have tCi (Bi , ui ) = 
Mi and thus rank /Ci (Si , ui) = rankwi = s. 

Now, assume that (2.2) holds for some k with 1 < k < m. 
By the definition of Bk and Uk, only the first ks rows of 
Bk and Uk will be involved in the left hand side of (2.2). 
Similarly, only the first ks columns of Bk will be involved. 
Since by assumption on B the leading ks x ks minor is non- 
zero, we have rank BklCk(Bk,u k ) = ks, which is equivalent 
to rank ICk(Bk,BkUk) = ks. By the fact that the first ks 
rows of Uk+i — Uk are zero, we have Bk(uk+i — Uk) = 0, or 
equivalently BkUk+i = BkUk, and hence 



rankK,k(Bk, BkUk+i) = ks. 
The matrix in (2.3) can be written as 



(2.3) 



K,k(Bk,BkUk+\) 







G F" 



where Mk G F sx s is nonsingular. Introducing the block 
u k +i we obtain the following matrix: 



[u k+ i,ICk(Bk,BkUk + i)] 



* 

Is 





M k 





(2.4) 



whose rank is (k + l)s. Noticing that 



Uk+i,tCk(Bk,BkUk+i) 



K-k+i {Bk, Mfc+i) 



we are led to 



rank/C fc+ i(Bfe,u fe+ i) ={k+l)s. 

Finally, using the fact that Bk is the specialization of Bk+i 
obtained by setting 8 k +i to zero, we obtain 

rank/Cfc + i(Bfc+i,Ufc+i) = (k + l)s, 

which is (2.2) for k + 1 and thus establishes the theorem by 
induction. □ 

If the leading ks x ks minor of A is nonzero then the 
leading ks x ks minor of A T is nonzero as well, for any 
integer k. 



Corollary 2.2. If the leading ks x ks minor of A is 
nonzero for 1 < k < m and B = VAV, then JC m (B T ,u) G 
F nxn is invertible. 



Suppose now that A G F nxn is an arbitrary non-singular 
matrix and the size of F exceeds n(n +1). It follows by 
Theorem 2 of Kaltofen and Saunders [18] that there exists 
a lower triangular Toeplitz matrix L € F nxn and an upper 
triangular Toeplitz matrix U G f nxn such that each of the 
leading minors of A = UAL is nonzero. Let B = T>AT>; then 
the products of the determinants of the matrices K m (B,u) 
and JC m (B T , u) (mentioned in the above theorem and corol- 
lary) is a polynomial with total degree less than 2n(m — 1) < 
n(n+ 1) (if m/1). In this case it follows that there is also a 
non-singular diagonal matrix D G f nxn such that K. m (B,u) 
and K, m {B T ,u) are non-singular, for 

B = DAD = DUALD. 

Now let R = LD 2 U G F nxn , u G F sxn and v G F nxs such 
that 

u T — (L T )~ 1 D~ 1 u and v = LDu. 

Then 



and 



/C m (iL4,v) = LDK. m {B,u) 



L T DK, m ((RA) T ,u T ) = K, m (B T ,u), 



so that /C m (iL4,t>) and /C m ((RA) T , w T ) are each non-singular 
as well. Since -D is diagonal and U and L are triangular 
Toeplitz matrices it is now easily established that (R, u, v) 
is an "efficient block projection" for the given matrix A, 
where this is as defined in [11]. 

This proves Conjecture 2.1 of [11] for the case that the 
size of F exceeds n(n + 1): 

Corollary 2.3. For any non-singular A G F nxn and 
s | n (over a field of size greater than n(n + I)) there exists 
an efficient block projection (R,u,v) G F nxn x F sxn x F nxs . 

3. FACTORIZATION OF THE MATRIX IN- 
VERSE 

The existence of the efficient block projection established 
in the previous section allows us to define a useful factor- 
ization of the inverse of a matrix. This was used to obtain 
faster heuristics for solving sparse integer matrices in [11]. 
The basis is the following factorization of the matrix inverse. 

Let B — T>AT>, where P is an n x n diagonal matrix 
whose diagonal entries consist of m distinct indeterminates, 
each occuring s times contiguously, as previously. Define 
Ku = K m {B, u) with u as in (2.1) and /cl = K, m (B T , u) T 
(where (r) and (£) refer to projection on the right and left 
respectively). For any < k < m — 1 and any two indices 
I and r such than I + r — k we have u T B l ■ B r u = u T B k u. 
Hence the matrix TL U = ■ B ■ tC^ is block-Hankel with 
blocks of dimension s x s: 

u T Bu u T B 2 u ... u T B m u 



Hu = 



u T B2u 
u T B m u 



u T B 3 u 



u B 



G F" 



From H u = /c£° ■ B ■ ICi r> = ■ VAV ■ JC ( u r) , we have 
following corollary to Theorem 2.1, which (together with 



Corollary 2.2) implies that fCu^ and fCu \ and thus H u are 
invertible. 

Corollary 3.1. If A e F" x " is such that all leading 
ks x ks minors are non-singular, V is a diagonal matrix 
of indeterminates and B = VAT), then B^ 1 and A~ x may 
be factored as 



5-1 ^W-w-i^W 
A' 1 ^KS^n-^V, 



(3.1) 



where /C« ^ and ICu ^ are block-Krylov as defined above, and 
H u G F nxn is block-Hankel (and invertible) with sx s blocks, 
as above. 

We note that for any specialization of the indeterminates in 
T> to field elements in F such that det 7i u / Owe get a similar 
formula to (3.1) completely over F. A similar factorization 
in the non-blocked case is used in [10, (4.5)] for fast parallel 
matrix inversion. 

4. BLACK-BOX MATRIX INVERSION OVER 
A FIELD 

Let A G F nxn be invertible and such that for any v G F nxl 
the matrix times vector product Av or A T v can be com- 
puted in 4>(n) operations in F (where <fr(n) > n). Follow- 
ing Kaltofen, we call such matrix-vector and vector-matrix 
products black-box evaluations of A. In this section we will 
show how to compute A" 1 with O r "(n 2-1 ^'" -1 ' ) ) black box 



evaluations and additional 0~(n 



3-l/(w-lh 



operations in F. 



Note that when </>(n) = 0~(n) — a common characteriza- 
tion of "sparse" matrices — the exponent in n of this cost 
is smaller than w, and is 0~(n 2 ' 273 ) with the currently best- 
known matrix multiplication. 

We assume for the moment that all principal ks x ks mi- 
nors of A are non-zero, 1 < k < m. 

Let 81,62, ... ,8 m be the indeterminates which form the 
diagonal entries of T> and let B = T>AT>. By Theorem 2.1 
and Corollary 2.2, the matrices K m {B,u) and tC m {B T ,u) 
are each invertible. If m > 2 then the product of the 
determinants of these matrices is a non-zero polynomial 
A G F[8i, . . . , 6m] with total degree less than 2n(m — 1) — 1. 

Suppose that F has at least 2n(m — 1) elements. Then A 
cannot be zero at all points in (F \ {0}) n . Let di, <fe, . . . , d m 
be nonzero elements of F such that A(di, di, ■ ■ ■ , d m ) 7^ 0, 
let D = diag(di, . . . , di, . . . , d m , . . . , d m ), and let B — DAD. 
Then K u r) = fC m (B,u) G F nxn and K u l) = K, m (B T ,u) T G 
pnxn are gg^jj i nver tible since A(di, d%, ■ ■ ■ , d m ) 0, B is 
invertible since A is and di, di, . . . , d m are all nonzero, and 
thus H u = K u e) BK u r) G F nxn is invertible as well. Corre- 
spondingly, (3.1) suggests 

B- 1 = kPh^kP, and A' 1 = DK^H^K^D 



for computing the matrix inverse. 



,(<) 



1. Computation of u T , u T B, u T B 2 " 1 ' 1 and K\' 

(r) 

We can compute this sequence, and hence K u with 
m— 1 applications of B to vectors using 0(n(j>(n)) op- 
erations in F. 

2. Computation of H u . 

Due to the special form (2.1) of u, one may then com- 
pute wu for any w G F sxn with O(sn) operations. 



Hence we can now compute u T B t u for < i < 2m — 1 
with 0(n 2 ) operations in F. 

3. Computation of H^ 1 . 

The off-diagonal inverse representation of H^ 1 as in 
(A. 4) in the Appendix can be found with 0~(s u 'm) op- 
erations by Proposition A.l. 

4. Computation of H~ 1 K^' '. 

From Corollary A. 2 in the Appendix, we can compute 
the product H~ 1 M for any matrix M £ F nxn with 
0~(s"m 2 ) operations. 

5. Computation of K ( u r) ■ {H^K^). 

We can compute K { J ] M = [it, Bit, ... , B m ~ 1 u]M, for 
any M G f nxn by splitting M into m blocks of s 
consecutive rows Mi, for < i < m — 1: 

m— 1 

1=0 (4.1) 
=uM + B(uMi + B(uM 2 H 

■ ■ ■ + B(uM m - 2 + BuMm-l) ■■■). 

Because of the special form (2.1) of it, each product 
uMi £ F nxn requires 0(n 2 ) operations, and hence all 
such products involved in (4.1) can be computed in 
0(mn 2 ) operations. Since applying B to an n x n 

matrix costs n<j>(n) operations, Ku^M is computed in 
0(mn<f>(n) + mn 2 ) operations using the iterative form 
of (4.1) 

In total, the above process requires 0(mn) applications 
of A to a vector (the same as for B), and 0(s"m 2 + mn 2 ) 
additional operations. If <j)(n) = 0~(n) the overall number 
of field operations is minimized with the blocking factor s — 

Theorem 4.1. Let A e p"- x "- ; where n = ms, be such 
that all leading ks x ks minors are non-singular for 1 < k < 
m. Let B = DAD for D = diag(di, . . . , di, . . . , d m , • • • , d m ) 
such that di, di, . . . , d m are nonzero and each of the matrices 
K, m {DAD,u) and ICm{{DAD) T ,u) is invertible. Then the 
inverse matrix A^ 1 can be computed using 0(n 2_1 ^ w_1 ') 
applications of A to vectors and an additional CT(n 3_1/ '' a ' _1 ') 
operations in F. 

The above discussion makes a number of assumptions. 

First, it assumes that the blocking factor s exactly divides 
n. This is easily accommodated by simply extending n to 
the nearest multiple of s, placing A in the top left corner 
of the augmented matrix, and adding diagonal ones in the 
bottom right corner. 

Theorem 4.1 also makes the assumptions that all the lead- 
ing ks x ks minors of A are non-singular and that the de- 
terminants of ?C m (DAD,u) and fC m ((DAD) T , u) are each 
nonzero. While we know of no way to ensure this determin- 
istically in the times given, standard techniques can be used 
to obtain these properties probabilistically if F is sufficiently 
large. 

Suppose, in particular, that n > 16 and that #F > 
2(m + l)n[log 2 n] . Fix a set 5 of at least 2(m + l)n[~log 2 n\ 
nonzero elements of F. We can ensure that the leading 
ks x ks minors of A are non-zero by pre-multiplying by a 
butterfly network preconditioner U with parameters chosen 



uniformly and randomly from S. If U is constructed us- 
ing the generic exchange matrix of [5, §6.2], then it will use 
at most n[log 2 n]/2 random elements from S, and from [5, 
Theorem 6.3] it follows that all leading ks x ks minors of 
A = UA will be non-zero simultaneously with probability 
at least 3/4. This probability of success can be made arbi- 
trarily close to 1 with a choice from a larger S. We note 
that A' 1 = A^U. Thus, once we have computed A we 
can compute A^ 1 with an additional 0~(n 2 ) operations in 
F, using the fact that multiplication of an arbitrary n x n 
matrix by an n x n butterfly preconditioner can be done 
with CT(n 2 ) operations. 

Once again let A be the products of the determinants 
of the matrices K, m {VAV, it) and IC m ((T>AV) T , u), so that 
A is nonzero with total degree less than 2n(m — 1). If we 
choose randomly selected values from <S for Si , . . . , S m , since 
#5 > 2(m + l)n[log 2 n] > 4deg A, the probability that A 
is zero at this point is at most 1/4 by the Schwartz-Zippel 
Lemma [22, 24]. 

In summary, for randomly selected butterfly precondi- 
tioner B, and independently and randomly chosen values 
di,d2, . . . , d m the probability that A — UA has non-singular 
leading ks x ks minors for 1 < k < m and A(di, d-z, . . . , d m ) 
is non-zero is at least 9/16 > 1/2 when random choices are 
made uniformly and independently from a finite subset S of 
F \ {0} with size at least 2(m + l)n[log 2 n] . 

When #F < 2(m + l)n[log 2 n] we can easily construct 
a field extension E of F which has size greater than 2(m + 
l)n[log 2 n] and perform the computation in that extension. 
Since this extension will have degree 0(log #F n) over F, it 
will add only a logarithmic factor to the final cost. While we 
certainly do not claim that this is not of practical concern, 
it does not affect the asymptotic complexity. 

Finally, we note that it is easily checked whether the ma- 
trix returned by this algorithm is the inverse of the input by 
using n multiplications by A by the columns of the output 
matrix and corresponding each to the corresponding column 
of the identity matrix. This results in a Las Vegas algorithm 
for computation of the inverse of a black-box matrix with 
the cost as given above. 

Theorem 4.2. Let A e F nxn be non-singular. Then the 
inverse matrix A^ 1 can be computed by a Las Vegas algo- 
rithm whose expected cost is 0~(n 2 ~ 1, ' l ' UJ ~ 1 * > ) applications of 
A to a vector and 0~(n 3_1 ^" -1 ') additional operations in F. 





Black-box 
applications 


Blocking 
factor s 


Inversion 
cost 


3 (Standard) 


1.5 


1/2 


0~(n 25 ) 


2.807 (Strassen) 


1.446 


0.553 


<X(n 2 ' 446 ) 


2.3755 (Cop/Win) 


1.273 


0.728 


o> 2 - 273 ) 



Table 4.1: Exponents of matrix inversion with a ma- 
trix x vector cost 4>(n) — 0~(n). 



Remark 4.3. The structure (2.1) of the projection u plays 
a central role in computing the product of the block Krylov 
matrix by anxn matrix. For a general projection it G F nxs , 
how to do better than a general matrix multiplication, i.e., 



how to take advantage of the Krylov structure for computing 
K U M, appears to be unknown. 

Applying a Black-Box Matrix Inverse to a Ma- 
trix 

The above method can also be used to compute A~ M for 
any matrix M G p nxn w ith the same cost as in Theorem 4.2. 
Consider the new step 1.5: 

1.5. Computation of K { u e) ■ M. 

Split M into m blocks of s columns, so that M — 
[M , . . . , M m _i] where M k G F nxs . Now consider com- 
puting ■ Mk for some k € {0, . . . , m — 1}. This can 
be accomplished by computing B l M k for < i < m— 1 
in sequence, and then multiplying on the left by u T to 
compute u T B t M k for each iterate. 

The cost for computing M k for a single k by the 
above process is n — s multiplication of A to vectors 
and 0(ns) additional operations in F. The cost of 
doing this for all k such that < k < m — lis thus 
m(n — s) < nm multiplications of A to vectors and 
0(n 2 ) additional operations. Since applying A (and 
hence B) to an n x n matrix is assumed to cost n<f>(n) 
operations in F, K$P ■ M is computed in 0{mncj){n) + 
mn 2 ) operations in F by the process described here. 

Note that this is the same as the cost of Step 5, so the 
overall cost estimate is not affected. Since Step 4 does not 

(£) 

rely on any special form for Ku , we can replace it with 
a computation of H^ 1 ■ {K^p M) with the same cost. We 
obtain the following: 

Corollary 4.4. Let A G F nxn be non-singular and let 
M G f nxn b e an y matrix. We can compute A~ X M using a 
Las Vegas algorithm whose expected cost is 0~(n 2 ~ 1, ' l ' UJ ~ 1 ^) 
multiplications of A to vectors andO~(n 3 ~ 1 / ( - u '~ 1 ^) additional 
operations in F. 

The estimates in Table 4 apply to this computation as 
well. 

5. APPLICATIONS TO BLACK-BOX MA- 
TRICES OVER A FIELD 

The algorithms of the previous section have applications 
in some important computations with black-box matrices 
over an arbitrary field F. In particular, we consider the 
problems of computing the nullspace and rank of a black- 
box matrix. Each of these algorithms is probabilistic of the 
Las Vegas type. That is, the output is certified to be correct. 

Kaltofen & Saunders [18] present algorithms for comput- 
ing the rank of a matrix and for randomly sampling the 
nullspace, building upon work of Wiedemann [23]. In par- 
ticular, they show that for random lower upper and lower 
triangular Toeplitz matrices U, L G F™ x n , and random di- 
agonal D, that all leading k x k minors of A — UALD are 
non-singular for 1 < k < r = rank A, and that if f A G F[x] is 
the minimal polynomial of A, then it has degree r + 1 if A is 
singular (and degree n if A is non-singular). This is proven 
to be true for any input A G p nxn ) an(1 f or random choice of 
U, L and D, with high probability. The cost of computing 
f A (and hence rank A) is shown to be 0(n) applications of 



the black-box for A and 0(n 2 ) additional operations in F. 
However, no certificate is provided that the rank is correct 
within this cost (and we do not know of one or provide one 
here). Kaltofen & Saunders [18] also show how to generate 
a vector uniformly and randomly from the nullspace of A 
with this cost (and, of course, this is certifiable with a single 
evaluation of the black box for A). We also note that the 
algorithms of Wiedemann and Kaltofen & Saunders require 
only a linear amount of extra space, which will not be the 
case for our algorithms. 

We first employ the random preconditioning of [18] : A — 
UALD as above. We will thus assume in what follows that 
A has all leading i x i minors non-singular for 1 < i < r. 
While an unlucky choice may make this statement false, this 
case will be identified in our method. Also assume that we 
have computed the rank r of A with high probability. Again, 
this will be certified in what follows. 

1. Inverting the leading minor. 

Let Aq be the leading r x r minor of A and partition 
A as 

A _(A AA 
A -\A 2 A 3 ) 

Using the algorithm of the previous section, compute 
Aq 1 . If this fails, then the randomized conditioning or 
the rank estimate has failed and we either report this 
failure or try again with a different randomized pre- 
conditioning. If we can compute Aq 1 , then the rank 
of A is at least the estimated r. 

2. Applying the inverted leading minor. 

Compute A^ 1 A 1 £ F rx(n ~ r) using the algorithm of 
the previous section (this could in fact be merged into 
the first step). 

3. Confirming the nullspace. 

Note that 

\A 2 Asjy -L ) - ^A-'Al-As) - U 
. ' 

JV 

and the Schur complement A2Aq 1 Ai — A3 must be 
zero if the rank r is correct. This can be checked with 
n — r evaluations of the black box for A. We note that 
because of its structure, N has rank n — r. 

4. Output rank and nullspace basis. 

If the Schur complement is zero, then output the rank 
r and N, whose columns give a basis for the nullspace 
of A. Otherwise, output fail (and possibly retry with 
a different randomized pre-conditioning). 

Theorem 5.1. Let A e F nxn have rank r. Then a basis 
for the nullspace of A and rank r of A can be computed with 
an expected number o/(T(n 2_1 ^" _1 - ) ) applications of A to a 
vector, plus an additional expected number o/(T(n 3_1 ^" _1 - ) ) 
operations in F. The algorithm is probabilistic of the Las 
Vegas type. 

6. APPLICATIONS TO SPARSE RATIONAL 
LINEAR SYSTEMS 

Given a non-singular A G Z nxn and b G Z nXl , in [11] 
we presented an algorithm and implementation to compute 



A 1 b with 0~(n 1 ' 5 (log(|| J 4|| + ||fc||))) matrix-vector products 
v i — ► A mod p for a machine-word sized prime p and any v G 
Z" xl plus(X(n 2 ' 5 (log(P|| + ||b||))) additional bit-operations. 
Assuming that A and & had constant sized entries, and that a 
matrix- vector product by A mod p could be performed with 
0~(n) operations modulo p, the algorithm presented could 
solve a system with 0~(n 2 ' 5 ) bit operations. Unfortunately, 
this result was conditional upon the unproven Conjecture 2.1 
of [11]: the existence of an efficient block projection. This 
conjecture was established in Corollary 2.3 of the current 
paper. We can now unconditionally state the following: 

Theorem 6.1. Given any invertible A € Z nxn and b G 
Z nxl , we can compute A~ 1 b using a Las Vegas algorithm. 
The expected number of matrix-vector products v i— » Av mod 
p is in (J~(n (log(\\A\\ + ||6||))), and the expected num- 
ber of additional bit- operations used by this algorithm is in 

OV' 5 (log(P|| + ll&ll))). 

Sparse integer determinant and Smith form 

The efficient block projection of Theorem 2.1 can also be 
employed relatively directly into the block baby-steps/giant- 
steps methods of [20] for computing the determinant of an 
integer matrix. This will yield improved algorithms for the 
determinant and Smith form of a sparse integer matrix. Un- 
fortunately, the new techniques do not obviously improve 
the asymptotic cost of their algorithms in the case for which 
they were designed, namely, for computations of the deter- 
minants of dense integer matrices. 

We only sketch the method for computing the determinant 
here following the algorithm in Section 4 of [20], and esti- 
mate its complexity. Throughout we assume that A G Z nxn 
is non-singular and assume that we can compute v i— > Av 
with 4>{n) integer operations, where the bit-lengths of these 
integers are bounded by (T(log(n + ||w|| + ||^4||)). 

Preconditioning and setup. 

Precondition A <— B — D1UAD2, where Di, D2 are 
random diagonal matrices, and U is a unimodular pre- 
conditioner from [23, §5]. While we will not do the 
detailed analysis here, selecting coefficients for these 
randomly from a set Si of size n 3 is sufficient to en- 
sure a high probability of success. This precondition- 
ing will ensure that all leading minors are non-singular 
and that the characteristic polynomial is squarcfree 
with high probability (see [5] Theorem 4.3 for a proof 
of the latter condition). From Theorem 2.1, we also 
see that ICm{B,u) has full rank with high probability. 

Let p be a prime which is larger than the a priori bound 
on the coefficients of the characteristic polynomial of 
A; this is easily determined to be (nlog ||^||) n+o(1) . 
Fix a blocking factor s to be optimized later, and as- 
sume n — ms. 

Choosing projections. Let « £ Z" xs be an efficient block 
projection as in (2.1) and v G Z nxs a random (dense) 
block projection with coefficients chosen from a set S2 
of size at least 2n 2 . 

Forming the sequence on = uA 1 v G Z sxs . 

Compute this sequence for i = . . . 2m. Computing 
all the A l v takes 0~(n(j>(n) ■ mlog ||j4||) bit operations. 
Computing all the uA x v takes 0~(n 2 ■ mlog||A||) bit 
operations. 



Computing the minimal matrix generator. 

The minimal matrix generator F(X) modulo p can be 
computed from the initial sequence segment Qo, . . . , Q2m 
See [20, §4]. This can be accomplished with (7(ms" • 
nlog \\A\\) bit operations. 

Extracting the determinant. 

Following the algorithm in [20, §4], we first check if 
its degree is less than n and if so, return "failure". 
Otherwise, we know det F A (X) = dct(AJ — A). Return 
detA = det.F(0) mod p. 

The correctness of the algorithm, and specifically the block 
projections, follows from fact that [u, Au, . . . , A m_1 w] is of 
full rank with high probability by Theorem 2.1. Since the 
projection v is dense, the analysis of [20, (2.6)] is applicable, 
and the minimal generating polynomial will have full degree 
m with high probability, and hence its determinant at A = 
will be the determinant of A. 

The total cost is (T {{n<f>{n)m + n 2 m + nms u ) log ||j4||) bit 
operations, which is minimized when s = n 1 /" . This yields 



an algorithm for the determinant which requires 0~((n 



2-1/u 



l log ||j4||) bit operations. This is probably most in- 



teresting when u) = 3, where it yields an algorithm for deter- 
minant which requires 0~(n 2 66 log ||j4||) bit operations on a 
matrix with pseudo-linear cost matrix- vector product. 

We also note that a similar approach allows us to use the 
Monte Carlo Smith form algorithm of [14], which is com- 
puted by means of computing the characteristic polynomial 
of random preconditionings of a matrix. This reduction is 
explored in [20] in the dense matrix setting. The upshot 
is that we obtain the Smith form with the same order of 
complexity, to within a poly-logarithmic factor, as we have 
obtained the determinant using the above techniques. See 
[20, §7.1] and [14] for details. We make no claim that this 
is practical in its present form. 

APPENDIX 

A. APPLYING THE INVERSE OF A BLOCK- 
HANKEL MATRIX 

In this appendix we address asymptotically fast techniques 
for computing a representation of the inverse of a block Han- 
kel matrix, for applying this inverse to an arbitrary matrix. 
The fundamental technique we will employ is to use the off- 
diagonal inversion formula of Beckermann & Labahn [1] and 
its fast variants [15]. An alternative to using the inversion 
formula would be to use the generalization of the Levinson- 
Durbin algorithm in [19]. 

For an integer m that divides n exactly with s = n/m, let 



H = 



eta 



ai 



Oil 



Ot2 



OL2n 



«m-l 



t»2m-2 
Q2m-1 



G F" 



(A.l) 

be a non-singular block-Hankel matrix whose blocks are sxs 
matrices over F, and let «2m be arbitrary in F sxs . We follow 
the lines of [21] for computing the inverse matrix H -1 . Since 
H is invertible, the following four linear systems (see [21, 



(3.8)-(3.11)]) 

H[q m - U --- ,*>]* = [(),••■ ,0,I] £ F" xs , 



H [v m , ■ ■ ■ , Vlf = — [dm,' • • Q2m-l«2m] G F" 



(A.2) 



and 



[9m- 1 

\v~ ■ 



■ q*o]H = [0 
v*]H = - [a„ 



I] G F s 



• a 2m -i «2m] G F sxn , 

(A.3) 

have unique solutions given by the qk,qt G F sxs , (for < 
k < m - 1), and the v fc , G F sxs (for 1 < k < m). Then 
we have (see [21, Theorem 3.1]): 



H- 1 = 



Vm-l 



Vl 
I 



Vl I 



9m-2 



90 





9o 



9o 

9m-l 



(A.4) 



The linear systems (A.2) and (A.3) may also be formulated 
in terms of matrix Pade approximation problems. We asso- 
ciate to H the matrix polynomial A = Yll^o a i xl e F sxs [a;]. 
The s x s matrix polynomials Q,P,Q*,P* in F sxs [a:] that 
satisfy 



A{x)Q{x) = P{x) + x 2 " 1 - 1 mod x 2m , 

where deg Q < m — 1 and deg P < m — 2, 

Q*{x)A(x) = P*(x) + x 2 ™- 1 mod x 2m , 

where deg Q* < m — 1 and deg P* < m — 2 



(A.5) 



are unique and provide the coefficients Q = "^^Lq 1 q%x l and 
Q* = 1i xl f° r constructing H^ 1 using (A.4) (see [21, 

Theorem 3.1]). The notation "mod x l " for i > denotes 
that the terms of degree i or higher are forgotten. The s x s 
matrix polynomials V, U, V * , U* in F sxs [:r] that satisfy 



A(x)V(x) = U(x) mod x 2m+1 , V(0) = /, 
where dog V < m and deg U < m — 1, 

V*^)^^) = U*(x) mod a; 2m+1 , V*(0) = /, 

where deg Q* < m — 1 and deg P* < m — 2, 



(A.6) 



are unique and provide the coefficients 1/ = 1 + fi^ 1 
and Q* = 1 + u*^ for (A.4). 

Using the matrix Pade formulation, the matrices Q, Q* , 
V, and V* may be computed using the cr-basis algorithm 
in [2], or its fast counterpart in [15, §2.2] that uses fast ma- 
trix multiplication. For solving (A.5), the a-basis algorithm 
with a = s(2m — 1) solves 



[A -I] 

[Q* T] 



Q 
P 

A 



Rx 2m - X mod x 2m , 



j-j* 2m — 1 i 2m 

= R x mod x 



with Q,P,Q ,P G F sxs [:r] that satisfy the degree con- 
straints degQ < m — l,degQ < m — 1, and degP < 
m — 2, deg P < m — 2. The residue matrices R and R* in 
F sxs are non-singular, hence QR^ 1 and (R*)^ 1 Q are so- 
lutions Q and Q for applying the inversion formula (A.4). 
For (A.6), the cr-basis algorithm with a — s(2m + 1) leads 
to 

" V ~ 
U 



[A -I] 



j 2m+l 

= moda: , 



[V* U } 



A 



= modx 2m+1 



with deg V < m, deg V < m, and deg U <m—l, deg U < 
m—1. The constant terms V(0) and V (0) in F sxs are non- 
singular, hence V = F(F(0))" 1 and V = (F^O))" 1 ^* 
are solutions for applying (A.4). Using Theorem 2.4 in [15] 
together with the above material we get the following cost 
estimate. 

Proposition A.l. Computing the expression (A.4) of the 
inverse of the block-Hankel matrix (A.l) reduces to multiply- 
ing matrix polynomials of degree 0(m) in F sxs , and can be 
done with CT(s u 'm) operations in F. 

Multiplying a block triangular Toeplitz or Hankel matrix 
in f nxn with blocks of size s x s by a matrix in F nxn reduces 
to the product of two matrix polynomials of degree O(m), 
and of dimensions s x s and s x n. Using the fast algorithms 
in [4] or [3], such a s x s product can be done in 0~(s bJ m) 
operations. By splitting the s x n matrix into s x s blocks, 
the sx s by sxn product can thus be done in 0~(m x s"m) = 
0~(s"m 2 ) operations. 

For n = s v let w(l, 1, v) be the exponent of the problem 
of s x s by s x n matrix multiplication over F. The splitting 
considered just above of the sxn matrix into s x s blocks, 
corresponds to taking w( 1,1,1/) = u) + v — 1 < v + 1.376 
(w < 2.376 due to [6]), with the total cost (T(s w{x '^ v) m) = 
0~(s^m 2 ). Depending on a > 1, a slightly smaller bound 
than v + 1.376 for w(l, 1, v) may be used due the matrix 
multiplication techniques specifically designed for rectangu- 
lar matrices in [16]. This is true as soon asf> 1.171, and 
gives for example w(l, l,v) < v + 1.334 for v = 2, i.e., for 
s = *Jn. 

Corollary A.2. Let H be the block-Hankel matrix of (A.l). 
If the representation (A.4) of H~ x is given then computing 
H~ 1 M for an arbitrary M G F nxn reduces to four s x s by 
sxn products of polynomial matrices of degree 0(m). This 
can be done with 0"(s tJ ' 1 ' 1 '"'m) or C r (s u 'm 2 ) operations in 
F (n = s v = ms). 
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